StructHDP: automatic inference of number of clusters and population structure from admixed genotype data.

Abstract

MOTIVATION: Clustering of genotype data is an important way of understanding similarities and differences between populations. A summary of populations through clustering allows us to make inferences about the evolutionary history of the populations. Many methods have been proposed to perform clustering on multilocus genotype data. However, most of these methods do not directly address the question of how many clusters the data should be divided into and leave that choice to the user.

METHODS: We present StructHDP, which is a method for automatically inferring the number of clusters from genotype data in the presence of admixture. Our method is an extension of two existing methods, Structure and Structurama. Using a Hierarchical Dirichlet Process (HDP), we model the presence of admixture of an unknown number of ancestral populations in a given sample of genotype data. We use a Gibbs sampler to perform inference on the resulting model and infer the ancestry proportions and the number of clusters that best explain the data.

RESULTS: To demonstrate our method, we simulated data from an island model using the neutral coalescent. Comparing the results of StructHDP with Structurama shows the utility of combining HDPs with the Structure model. We used StructHDP to analyze a dataset of 155 Taita thrush, Turdus helleri, which has been previously analyzed using Structure and Structurama. StructHDP correctly picks the optimal number of populations to cluster the data. The clustering based on the inferred ancestry proportions also agrees with that inferred using Structure for the optimal number of populations. We also analyzed data from 1048 individuals from the Human Genome Diversity project from 53 world populations. We found that the clusters obtained correspond with major geographical divisions of the world, which is in agreement with previous analyses of the dataset.

AVAILABILITY: StructHDP is written in C++. The code will be available for download at http://www.sailing.cs.cmu.edu/structhdp.

CONTACT: suyash@cs.cmu.edu; epxing@cs.cmu.edu.
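
As an illustration of the idea in METHODS that the number of clusters need not be fixed in advance, the following is a minimal sketch of a Chinese restaurant process, the standard sequential view of a single-level Dirichlet process prior. It is not the StructHDP model or its Gibbs sampler: it ignores genotype data and admixture entirely, and it shows only the prior, not posterior inference. The function name `crp_assignments` and the concentration parameter `alpha` are hypothetical names chosen for this example.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values make
    opening a new cluster more likely. Illustrative sketch only.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []  # labels[i] = cluster index assigned to item i
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    # 155 matches the size of the thrush dataset mentioned above,
    # but no real data is used here.
    labels, counts = crp_assignments(155, alpha=1.0)
    print("clusters drawn from the prior:", len(counts))
```

The number of clusters returned is an outcome of the draw rather than an input, which is the property a Dirichlet process prior contributes; the hierarchical version described in the abstract additionally shares clusters across individuals.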
